Unsupervised Machine Learning for Explainable Medicare Fraud Detection
The US federal government spends more than a trillion dollars per year on
health care, largely provided by private third parties and reimbursed by the
government. A major concern in this system is overbilling, waste and fraud by
providers, who face incentives to misreport on their claims in order to receive
higher payments. In this paper, we develop novel machine learning tools to
identify providers that overbill Medicare, the US federal health insurance
program for elderly adults and the disabled. Using large-scale Medicare claims
data, we identify patterns consistent with fraud or overbilling among inpatient
hospitalizations. Our proposed approach for Medicare fraud detection is fully
unsupervised, not relying on any labeled training data, and is explainable to
end users, providing reasoning and interpretable insights into the potentially
suspicious behavior of the flagged providers. Data from the Department of
Justice on providers facing anti-fraud lawsuits and several case studies
validate our approach and findings both quantitatively and qualitatively.
Comment: Working paper
Discovery and Exploitation of Generalized Network Effects
Given a large graph with few node labels, how can we (a) identify whether
the graph exhibits generalized network effects (GNE), (b) estimate
GNE to explain the interrelations among node classes, and (c) exploit GNE to
improve downstream tasks such as predicting the unknown labels accurately and
efficiently? The knowledge of GNE is valuable for various tasks like node
classification and targeted advertising. However, identifying and understanding
GNE such as homophily, heterophily or their combination is challenging in
real-world graphs due to limited availability of node labels and noisy edges.
We propose NetEffect, a graph mining approach to address the above issues,
enjoying the following properties: (i) Principled: a statistical test to
determine the presence of GNE in a graph with few node labels; (ii) General and
Explainable: a closed-form solution to estimate the specific type of GNE
observed; and (iii) Accurate and Scalable: the integration of GNE for accurate
and fast node classification. Applied to public, real-world graphs, NetEffect
discovers the unexpected absence of GNE in numerous graphs that were previously
thought to exhibit heterophily. Further, we show that incorporating GNE is
effective for node classification. On a large real-world graph with 1.6M nodes
and 22.3M edges, NetEffect achieves over 7 times speedup (14 minutes vs. 2
hours) compared to most competitors.
Comment: Under Submission
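To make the idea of "interrelations among node classes" concrete, the sketch below counts how often labeled endpoints of each class pair are connected and row-normalizes the counts into a class-compatibility table. This is a naive empirical estimate written for illustration only; it is not NetEffect's statistical test or its closed-form estimator, which the abstract does not spell out. The function name `edge_label_compatibility` and the toy graph are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def edge_label_compatibility(
    edges: Iterable[Tuple[Hashable, Hashable]],
    labels: Dict[Hashable, str],
) -> Dict[str, Dict[str, float]]:
    """Row-normalized counts of class pairs over edges whose endpoints are both labeled.

    Hypothetical helper: a plain empirical estimate, not the NetEffect estimator.
    """
    counts: Counter = Counter()
    for u, v in edges:
        # Skip edges with an unlabeled endpoint; only labeled pairs are counted.
        if u in labels and v in labels:
            # Treat the graph as undirected: count the pair in both directions.
            counts[(labels[u], labels[v])] += 1
            counts[(labels[v], labels[u])] += 1

    classes = sorted(set(labels.values()))
    matrix: Dict[str, Dict[str, float]] = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0) for b in classes
        }
    return matrix


# Toy path graph 1-2-3-4 with alternating labels: every edge joins A to B.
toy_edges = [(1, 2), (2, 3), (3, 4)]
toy_labels = {1: "A", 2: "B", 3: "A", 4: "B"}
compat = edge_label_compatibility(toy_edges, toy_labels)
```

A row dominated by its diagonal entry suggests homophily for that class, a row dominated by off-diagonal entries suggests heterophily, and near-uniform rows are what one would expect when no network effect is present. With only a few labels the counts are noisy, which is why a formal test is needed before reading anything into such a table.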
Benefit-aware Early Prediction of Health Outcomes on Multivariate EEG Time Series
Given a cardiac-arrest patient being monitored in the ICU (intensive care
unit) for brain activity, how can we predict their health outcomes as early as
possible? Early decision-making is critical in many applications, e.g.
monitoring patients may assist in early intervention and improved care. On the
other hand, early prediction on EEG data poses several challenges: (i)
earliness-accuracy trade-off; observing more data often increases accuracy but
sacrifices earliness, (ii) large-scale (for training) and streaming (online
decision-making) data processing, and (iii) multi-variate (due to multiple
electrodes) and multi-length (due to varying length of stay of patients) time
series. Motivated by this real-world application, we present BeneFitter, which
combines the savings incurred from an early prediction and the cost of
misclassification into a single domain-specific target called benefit.
Unifying these two quantities allows us to directly estimate a single target
(i.e., benefit) and, importantly, dictates exactly when to output a prediction:
when the benefit estimate becomes positive. BeneFitter (a) is efficient and fast,
with training time linear in the number of input sequences, and can operate in
real-time for decision-making, (b) can handle multi-variate and variable-length
time-series, suitable for patient data, and (c) is effective, providing up to
2x time-savings with equal or better accuracy compared to competitors.
Comment: arXiv submission
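The stopping rule stated in the abstract (output a prediction once the benefit estimate becomes positive) can be sketched as a loop over growing prefixes of the series. The names `first_positive_benefit` and `toy_benefit`, the horizon of 10 steps, and the cost formula are all assumptions made for illustration; in particular, `toy_benefit` is a hand-written stand-in for whatever learned benefit estimator BeneFitter actually uses.

```python
from typing import Callable, Optional, Sequence

Prefix = Sequence[Sequence[float]]  # one inner sequence of channel values per time step


def first_positive_benefit(
    stream: Prefix,
    estimate_benefit: Callable[[Prefix], float],
) -> Optional[int]:
    """Return the 1-based step at which the benefit estimate first becomes
    positive, or None if it never does within the stream."""
    for t in range(1, len(stream) + 1):
        if estimate_benefit(stream[:t]) > 0:
            return t
    return None


def toy_benefit(prefix: Prefix) -> float:
    """Hypothetical benefit: savings from stopping now minus expected error cost."""
    horizon = 10                                   # assumed maximum number of steps
    savings = float(horizon - len(prefix))         # earlier stop, larger savings
    evidence = abs(sum(step[0] for step in prefix))
    cost = 12.0 / (1.0 + evidence)                 # more evidence, lower expected cost
    return savings - cost


# Single-channel toy stream; a real EEG recording would have one value per electrode.
toy_stream = [[0.1], [0.2], [0.5], [0.9]]
stop_at = first_positive_benefit(toy_stream, toy_benefit)
```

In this toy setup the estimate is negative for the first two steps and turns positive at the third, so the loop stops there without reading the rest of the stream. Because the rule only needs the current prefix, the same loop applies to streaming data and to series of different lengths.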